Abstract
Introduction:
Fluorescence in situ hybridization (FISH) is a valuable method for the rapid analysis of relevant copy number alterations or structural rearrangements. However, interpretation of the results is labor-intensive and requires experienced, highly qualified staff. We therefore aim to develop an AI-based algorithm for the interpretation of interphase FISH images to support laboratory personnel and speed up routine diagnostics.
Methods:
We used data from FISH probes (all MetaSystems®) frequently utilized in hematologic diagnostics: two probes detecting copy number alterations on chr5p15 and chr5q31 (5pq) or chr1p32 and chr1q21 (1pq), a dual-color dual-fusion probe detecting BCR::ABL1 rearrangements (BCRABL), and a break-apart probe detecting KMT2A rearrangements (KMT2A). To predict the number of red (r), green (g), and, where relevant, fusion (f) signals for each cell image, a deep convolutional neural network (ResNet18) was applied. In detail, the initial dataset for 1pq contained 372,391 images, with 271,014 images showing a normal signal constellation (2g, 2r) and 101,377 showing aberrant signal constellations; for 5pq we collected 195,298 (normal: n=170,397; 2g, 2r), for KMT2A 95,171 (normal: n=80,851; 2g, 2r, 2f), and for BCRABL 84,483 (normal: n=62,600; 2g, 2r, 0f) cell images. The data included a variety of signal patterns, some occurring only in very small numbers. To ensure robust classifier training, the 3 dominant patterns covering approximately 90% of the data were selected for each probe for further processing. Because of the predominance of normal signals, the number of normal images was reduced to achieve a more balanced class distribution. Additionally, images that were not evaluable even for a human examiner were removed. After balancing and filtering, the number of images used for training was limited for 1pq (2g, 2r: n=20,935; 2g, 3r: n=21,072; 2g, 4r: n=4,393), for 5pq (2g, 2r: n=13,085; 2g, 1r: n=4,209; 3g, 3r: n=6,550), for KMT2A (2g, 2r, 2f: n=3,804; 2g, 2r, 1f: n=991; 3g, 3r, 3f: n=541), and for BCRABL (2g, 2r, 0f: n=5,693; 3g, 3r, 2f: n=2,760; 2g, 2r, 1f: n=468). For evaluation, we used a separate test set, evenly distributed across the 3 signal patterns (1pq: n=5,515; 5pq: n=5,637; KMT2A: n=669; BCRABL: n=552).
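The class-balancing step described above can be sketched as follows. This is a minimal illustration, assuming the overrepresented normal pattern is downsampled toward the size of the largest retained aberrant class; the abstract reports the resulting per-class counts, not the exact procedure, and `reduce_normal_class` is a hypothetical helper name.

```python
import random

def reduce_normal_class(cells_by_pattern, normal_pattern, seed=0):
    """Downsample the overrepresented normal signal pattern.

    cells_by_pattern: dict mapping a pattern label (e.g. "2g,2r")
    to the list of cell images carrying that pattern. Aberrant
    classes are kept in full; the normal class is capped at the
    size of the largest aberrant class.
    """
    rng = random.Random(seed)
    cap = max(len(cells) for pattern, cells in cells_by_pattern.items()
              if pattern != normal_pattern)
    balanced = dict(cells_by_pattern)
    if len(balanced[normal_pattern]) > cap:
        balanced[normal_pattern] = rng.sample(balanced[normal_pattern], cap)
    return balanced
```

For 1pq, for example, such a cap would bring the ~271,000 normal (2g, 2r) images down to the scale of the dominant aberrant class before training.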
Results:
The model achieved a mean accuracy of 0.93 for 5pq, 0.92 for KMT2A, 0.88 for 1pq, and 0.83 for BCRABL on the test set. Since the 3 dominant patterns cover approximately 90% of the cell images and these can be predicted with the respective accuracy, we can assume that, depending on the probe, 75% (90%·0.83, BCRABL) up to 84% (90%·0.93, 5pq) of the resulting data can be classified correctly. For 5pq, considering only results with a prediction probability of at least 80%, comprising 83% of the probe's overall data, 77% (83%·0.93) of all cell images could be classified automatically for the 3 most frequent signal patterns.
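The coverage arithmetic above, together with the confidence-based filtering, can be sketched as follows. This is a minimal illustration; representing each prediction's top-class probability under a `"prob"` key is an assumption, as the abstract does not specify how the prediction probability is obtained.

```python
def auto_classified_fraction(pattern_coverage, accuracy):
    # Expected fraction of all cell images classified correctly:
    # coverage of the dominant signal patterns times the model's
    # accuracy on those patterns.
    return pattern_coverage * accuracy

def confident_predictions(predictions, threshold=0.80):
    # Route only high-confidence predictions to automatic classification;
    # the remainder goes to manual review. Each prediction is assumed to
    # carry its top-class probability under the key "prob".
    return [p for p in predictions if p["prob"] >= threshold]

# BCRABL: 0.90 * 0.83 ~ 0.75
# 5pq, restricted to the 83% of data above the 80% threshold: 0.83 * 0.93 ~ 0.77
```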
As routinely captured FISH images frequently show imperfections, such as superimposed, fragmented, or low-intensity signals, we also analyzed the influence of imperfect signals on classifier performance. To this end, quality measures were annotated manually for a subset of 5pq (n=3,458) and 1pq (n=3,032) images. For 5pq, 21%, and for 1pq, 36% of the annotated images showed imperfect (challenging) signal characteristics. Notably, the classifier showed a higher error rate on images with such challenging characteristics.
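This stratified quality analysis can be reproduced with a simple per-group error-rate computation. A minimal sketch, assuming each annotated image carries a boolean quality flag and a correctness flag (the field names `"challenging"` and `"correct"` are hypothetical):

```python
def error_rate_by_quality(records):
    """Compute the classifier error rate separately for images with
    clean vs. challenging (imperfect) signal characteristics.

    records: iterable of dicts with boolean keys "challenging"
    (manual quality annotation) and "correct" (whether the predicted
    signal pattern matched the ground truth).
    """
    counts = {"clean": [0, 0], "challenging": [0, 0]}  # [errors, total]
    for r in records:
        bucket = "challenging" if r["challenging"] else "clean"
        counts[bucket][0] += 0 if r["correct"] else 1
        counts[bucket][1] += 1
    return {k: (errs / n if n else 0.0) for k, (errs, n) in counts.items()}
```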
Conclusions:
The proposed classifier, currently being evaluated in parallel to routine diagnostics, achieved a high accuracy of up to 0.93. Presenting the most confident predictions first and grouping results by signal pattern helps reduce the amount of manual correction needed: for 5pq, 77% of the cell images can be classified correctly in an automated manner. For the remaining cell images, however, insufficient image quality and limited signal pattern coverage still require substantially more manual review. To improve the classifier and expand signal pattern coverage, newly labeled data can be added continuously. This will significantly reduce the previously manual part of the workflow and markedly shorten turnaround time.